Binary Neural Networks: Algorithms, Architectures, and Applications (Baochang Zhang, Sheng Xu, Mingbao Lin, et al.)

Contents

About the Authors

1 Introduction

1.1 Principal Methods
1.1.1 Early Binary Neural Networks
1.1.2 Gradient Approximation
1.1.3 Quantization
1.1.4 Structural Design
1.1.5 Loss Design
1.1.6 Neural Architecture Search
1.1.7 Optimization

1.2 Applications
1.2.1 Image Classification
1.2.2 Speech Recognition
1.2.3 Object Detection and Tracking
1.2.4 Applications

1.3 Our Works on BNNs

2 Quantization of Neural Networks

2.1 Overview of Quantization
2.1.1 Uniform and Non-Uniform Quantization
2.1.2 Symmetric and Asymmetric Quantization

2.2 LSQ: Learned Step Size Quantization
2.2.1 Notations
2.2.2 Step Size Gradient
2.2.3 Step Size Gradient Scale
2.2.4 Training

2.3 Q-ViT: Accurate and Fully Quantized Low-Bit Vision Transformer
2.3.1 Baseline of Fully Quantized ViT
2.3.2 Performance Degeneration of Fully Quantized ViT Baseline
2.3.3 Information Rectification in Q-Attention
2.3.4 Distribution Guided Distillation Through Attention
2.3.5 Ablation Study

2.4 Q-DETR: An Efficient Low-Bit Quantized Detection Transformer
2.4.1 Quantized DETR Baseline
2.4.2 Challenge Analysis
2.4.3 Information Bottleneck of Q-DETR
2.4.4 Distribution Rectification Distillation
2.4.5 Ablation Study
